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Abstract 


Generalized  linear  models  are  widely  used  by  data  analysts.  However,  the  choice  of  the  linlc 
function,  i.e.,  the  scale  on  which  the  mean  is  linear  in  the  explanatory  variables  is  often  made 
arbitrarily  .  Here  we  permit  the  data  to  estimate  the  linlc  function  by  incorporating  it  as 
an  unknown  in  the  model.  Since  the  linlc  function  is  usually  taken  to  be  strictly  increasing, 
by  a  strictly  increasing  transformation  of  its  range  to  the  unit  interval  we  can  model  it 
as  a  strictly  increasing  cumulative  distribution  function.  The  tranafozmation  results  in  a 
domain  which  is  [0,1]  as  well.  We  model  the  cumulative  distribution  function  as  a  mixture 
of  Beta  cumulative  distribution  functions,  noting  that  the  latter  family  is  dense  within  the 
collection  of  all  continuous  densities  on  [0,1].  For  the  fitting  of  the  model  we  take  a  Bayesian 
approach,  encouraging  vague  priors,  to  focus  upon  the  likelihood.  We  discuss  choices  of  such 
priors  as  well  as  the  integrability  of  the  resultant  posteriors.  Implementation  of  the  Bayesian 
approach  is  carried  out  using  sampling  based  methods,  in  particular,  a  tailored  Metropolis- 
within- Gibbs  algorithm.  An  illustrative  example  utilising  data  involving  wave  damage  to 
cargo  ships  is  provided. 
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^  1  Introduction 


Genexalized  linear  models  have  1^  now  become  a  standard  class  of  mnWwl*  for  exploration 
within  the  data  analyst’s  tool  kit.  The  evolution  of  these  models  along  with  details  on  fitting 
them  is  provided  in  McCullagh  and  Nelder  (1989).  The  GLIM  software  for  carrying  out  the 
model  fitting  is  widely  available. 

Generalized  linear  models  have  been  advocated  as  an  advance  over  standard  linear  models 
in  that  they  allow  for  (i)  nonnormal  aampliTig  mechanisms,  (ii)  heterogeneous  variances  which 
are  capturtKl  through  the  mean-variance  ^ationship  of  the  sapling  model ,  and  (iii)  a  mean 
for  the  observations  which  need  only  be  linear  on  a  transformed  scale.  This  transformation, 
referred  to  as  the  link  function,  is  the  focus  of  the  present  paper.  That  is,  often  the  stochastic 
mechanism  for  the  observations  arises  naturally  as,  for  example,  a  binomial  or  Poisson  in 
the  case  of  count  data.  However  choice  of  the  scade  upon  which  the  transformed  mean  is 
presumed  linear  is  often  made  arbitrarily.  Informal  classical  diagonostic  tools  for  selecting 
a  link  and  for  assessing  the  adequacy  of  a  link  are  discussed  in  the  McCullagh  and  Nelder 
(1989)  drawing  upon  work  of  Pregibon  (1980)  and  Hinkley  (1985).  In  particular,  employing 
a  family  of  power  link  functions,  an  approach  in  the  spirit  of  the  Box- Tidwell  transformation 
can  suggest  an  appropriate  power.  However,  this  family  insists  upmi  a  positive  mean  and  in 
addition  may  be  too  small  within  the  class  of  strictly  monotone  functions. 

We  treat  the  link  function  as  another  unknown  in  the  generalized  linear  model  specifica¬ 
tion  and  estimate  it  jointly  with  the  mean  structure.  Our  approach  is  thus  semiparametric; 
we  assume  a  linear  parametric  form  for  the  mean  on  a  transformed  scale  where  the  trans¬ 
formation  is  expressed  nonparametrically.  Moreover,  for  a  given  linear  form  our  fitted  link 
function  may  be  compared  with  say  the  canonical  link  to  identify  shortcomings  of  the  latter. 

Our  fitting  is  within  the  Bayesian  firamework  treating  aill  model  unknowns  as  random  with 

V 

inference  proceeding  firom  the  posterior  distribution  m  these  unknowns.  If  prior  information 
is  available  say  about  coefficient  parameters  we  would  be  happy  to  use  it.  However,  interest 
usually  focusses  more  upon  the  likelihood  whence  we  would  tend  to  use  noninformative 
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prion.  Recent  advances  in  Bayesian  computation  using  sampling  baaed  ttmtKrxja  (Gdfsnd 
and  Smith,  1990;  Smith  &  Gelfand,  1993)  enable  reasonably  straight  forward  fitting  of 
such  models.  In  fact  Gibbs  sampling  was  utilised  by  Dellaportas  and  Smith  (1993)  for 
implementing  the  Bayesian  analysis  of  standard  generalized  linear  models. 

The  Dirichlet  process  prior  (Ferguson,  1973)  has  been  the  usual  tool  for  non- 

parametric  Bayesian  inference.  In  an  rmpublished  thesis,  Escobar  employs  the  Dirichlet 
process  prior  in  investigating  a  nonparametric  version  of  the  simultaneous  normal  Tni»xTm 
problem.  This  idea  has  been  extended  in  a  series  of,  as  yet  unpublished,  papers  authored 
by  Escobar,  West  and  Erkanli  amon^  others  to  include  generalized  linear  models.  In  all  of 
this  work  the  nonparametric  ^ect  of  the  modelling  is  introduced  at  the  second  stage;  the 
first  stage  specification  is  a  fully  parametric  likelihood.  No  nonparametric  estimate  of  the 
link  function  results  say  for  comparison  with  the  link  assumed  in  the  first  stage. 

Unpublished  work  of  Czado  and  Newton  introduces  a  random  liulc  into  the  likelihood 
in  the  binary  regression  problenu  They  assume  Pr(yi  »  l|*i,;9)  —G(xfj3)  where  G  is  an 
unknown  cumulative  distribution  function.  G  is  assumed  to  be  a  random  draw  from  a 
Dirichlet  process  prior  independent  of  /3.  The  base  measure  for  G  might  be  normal  or 
logistic.  Czado  and  Newton  show  that,  with  the  inclusion  of  latent  variables  G  such 
that  Pi(y{  s  l|zi,^)=  Pr(‘Ui  <  xf/3),  G  can  be  marginalised  out  and  a  straightforward 
Gibbs  sampler  arises.  Here  the  Dirichlet  process  prior  is  convenient  since  the  inverse  link  is 
a  distribution  function.  Extension  to  other  ^neralized  linear  models  is  not  obvious. 

For  an  arbitrary  generalized  linear  model  our  approach  describes  the  strictly  increasing 
link  function  g,  suitably  transformed  to  have  range  in  (0,1),  again  as  an  unknown  cumulative 
distribution  function.  In  the  process  the  resiilting  domain  also  becomes  (0,1).  We  model 
this  function  as  a  mixture  of  Beta  distribution  functions  appealing  to  the  well  known  result 
that  any  continuous  density  on  (0,1)  can  be  arbitrarily  well  approximated  by  a  discrete 
mixture  of  Beta  densities.  Unlike  distributions  arising  under  a  Dirichlet  process,  which 
could  also  be  used  here,  we  have  a  continuous,  dense  class  of  distributions  admitting  an 
explicit  form.  In  practice  we  have  treated  the  number  of  mixands  r,  as  fixed  though  a 
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discrete  prior  could  be  attempted.  We  have  experimented  with  a  range  of  r*s,  for  a  number 
of  examples,  discovering,  perhaps  not  surprisingly,  that  robustness  occurs  with  quite  small 
r.  In  introducing  randomness  to  finite  mixture  model  it  is  simpler  assume  the  mixture 
weights  to  be  random  rather  than  the  parameters  of  the  Beta  densities. 

Interesting  related  nonparametric  Bayesian  regression  work,  which  also  does  not  employ 
Diiichlet  process  priors,  includes  Blight  and  Ott  (1975),  O’Hagan  (1978),  Weerahandi  and 
Zidek  (1988)  and  Angers  and  Delampady  (1992). 

The  outline  o£  this  paper  is  thus  the  following,  hi  section  2  we  detail  our  general  approach. 
Section  3*xonsiders  non  informative' priors  appropriate  for 'our  likelihood  specification.  In 
section  4  we  describe  the  fitting  of  these  models  using  a  sampling  based  approach.  Section 
5  examines  a  data  set  taken  from  McCuUagh  and  Nelder  (1989)  where  a  Poisson  regression 
is  fit  assuming  a  canonical  link.  Allowing  an  unknown  link,  not  surprisingly,  we  obtain  an 
improved  model.  But  also,  comparison  of  the  estimated  link  with  the  canonical  link  reveals 
the  nature  of  the  shortcomings  of  the  latter.  We  conclude  with  a  brief  summary. 


2  Likelihood  form 

Recall  that  a  generalized  linear  model  assumes  a  one-parameter  exponential  family 
form  for  the  distribution  of  the  response  y,  i.e., 

/W,  s)  =  d(y,  ^)exp[{0y  -  b(fi)}/a(^)]  (1) 

is  a  density  with  respect  to  Lebesgue  measure  if  y  is  continuous,  with  respect  to  counting 
measure  if  y  is  discrete.  Here  /i  =  E(y)  =  b'(0)  and  var(y)=6^^(9)/o(^).  Furthermore, 
=  rf  =  where  z  is  a  pxl  known  vector  of  covariates,  0  v&  asL  unknown  pxl  vector 
of  coefficients  and  y  is  a  strictly  increasing  differentiable  function.  The  link  function  g,  often 
taken  to  be  the  so  called  canonical  link,  g{fi)={b’)~^{/i),  is  assumed  unknown  as  well.  In 
some  generalized  linear  models  such  as  the  binomial  and  Poisson,  a(^)  is  a  known  constant  . 
In  general  we  assume  that  a(^)=^/t/,  i.e.,  that  (j)  enters  as  an  unknown  scale  parameter  with 
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V  a  known  "sample  sise”.  Then  giTen  a  sample  of  xesponses  y«,  t  =  l,2***n  with  associated 
covariates  vectors  Xi  and  sample  sizes  Uf  the  resulting  likelihood  becomes 

=  n  -  K^O}]  (2) 

iml 

where  ui<l  Mi  =  likelihood  in  (2)  is  infinite  dimensional;  without 

further  assumptions  it  need  not  be  identifiable.  For  example,  if  x  is  a  single  continuous 
covariate,  i.e.,  m  =  +  ^x)i  then  /So  and  0\  cannot  be  identified. 

Our  inference  approach  is  Bayesian  requiring  the  specification  of  a  prior  f(j3, 4,9)-  The 
identifiability  question  firom.aJBayesiaa  point  of  view,  becomes  whether  the  data  can  inform 
about  all  of  the  unknown  parameters  in  the  model.  If  yes,  provided  a  proper  posterior  results, 
there  is  no  identifiability  problem.  If  not  and  if  /is  improper  then  the  posterior  necessarily 
is  as  well  have  an  ill-defined  Ba3reaian  model.  If  not  and  if  /is  proper  then  the  prior 

drives  the  posterior. 

The  mapping  g  is  from  the  space  of  p,  say  Q,  into  R^.  As  will  become  obvious  shortly,  it 
is  convenient  to  work  with  a  mapping  firom  into  Cl,  Suppose  T  is  a  strictly  increasing 
difFerentiable  transformation  from  Cl  into  (0,1)  with  «/(»7)~T(p"^(i7)).  Then  J  is  a  strictly 
increasing  differentiable  distribution  function.  So  modeling  the  function  g  is  equivalent  to 
modeling  an  unknown  distribution  function.  A  rich  class  of  models  may  be  created  as  follows. 
Let  ^  be  a  baseline  link  function  for  g,  perhaps  the  canonical  link,  and  let  •/o(^)=T(po  ^(17)) 
be  the  cumulative  distribution  function  associated  with  go.  Diaconis  and  Ylvisaker  (1985) 
argue  that  discrete  mixtures  of  Beta  densities  provide  a  continuous  dense  class  of  models  for 
densities  on  (0,1).  A  general  member  has  the  form 

^(«)  =  «'«-Be(u|q,  d|)  (3) 

(=1 

where  r  denotes  the  number  of  mixands,  tV(  >  0, =  1  -Se(u|c{,  di)  denotes  the  Beta 

density  in  standard  form  with  parameters,  ci  and  d(.  If  IB{u-,ciydi)  denotes  the  incomplete 
Beta  function  associated  with  Be(u|c(,d()  then  let 

Av)  =  q,  di).  (4) 

isl 
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Qearly  Jiff)  is  a  distribution  function  and  M=^~'(i7)a=  T~\J{ri))  is  readily  calculated.  Thus 
a  (6  is  and  so  given  a  set  of  yi,  z<  Vi  and  a  0  and  we  can  evaluate  the  likelihood 

(2)  directly.  Note  that  calculation  of  g(/i)  for  a  given  /i,  requires  a  clumsy  inversion  for  a 
corresponding  quantile  of  J(q)  clarifjring  the  advantage  to  mnd*»liTig  Mixtures  other 
than  Beta  could  be  used,  e.g.,  gammas  on  R*",  uniforms  on  R^. 

We  could  assume  r  unknown  but  do  not  since  in  practice  this  gains  little.  In  our  ex¬ 
perience,  inference  ,  e.g.,  estimation  of  /Xt,  prediction  of  y^,  is  very  robust  to  choice  of  r; 
mixtures  with  r=3  or  4  are  virtually  indistinguishable  from  those  with  much  larger  r.  In 
fact,  adlowing  r>n  does  not  insure  perfect  fit  since  g  is  restricted  to  be  monotone.  Given  r,  it 
is  mathematically  easier  to  assume  that  the  component  Beta  densities  are  specified  but  that 
the  weights  are  unknown.  We  choose  the  set  of  ci,  dt  to  provide  a  collection  of  Beta  densities 
which  blanket  (0,1).  In  particular  we  work  with  <k  =  A(r-l-l-l).  Hence  specification  of 
g  is  equivalent  to  specification  of  w  and  we  can  denote  (2)  by  L(0,v),  4). 

The  choice  of  T  is  not  a  modeling  issue.  In  principle  any  strictly  increasing  differentiable 
function  firom  12  to  could  be  used.  For  12  =s  we  mi^t  use  T(*)  =!e^’V(l+«^'^)>  for 
12  =  we  might  use  T(')  =*(’)/{l+(’)}i  il={0,l]  we  might  use  T(-)=s(*).  In  practice 

computatioxial  difficulties  can  be  ameliorated  by  centering  and  scaling  these  choices.  For 
example,  if  yo(/i)  =  loy/«,  So^iv)  =  c**  and  over/\mder  fiow  problems  will  arise  if  |j7i|  can 
be  large.  However  in  (4),  Jo(»7)  is  required,  not  yo  *(*?)•  ^  choose  T  so  that, 

given  Tf,  computation  of  the  composite  function,  T(yo^(^))  avoids  these  problems?  If  we 
try  T(-)  =(•)/{!  +  (•)}!  for  large  \r]i\,  within  the  accuracy  of  the  computer,  T  will  be  0 
or  1.  Consider  instead  =  *!(•)** /{I  +  ^  =  1/^-  Then 

Joiff)  —  -H  Hence  if  a  “centers"  the  rji  and  b  “scales”  them,  Jo  can  be 

computed  without  problem.  Treating  min  yi  and  max  y*  as  a  range  for  Hi  enables  a  range  for 
Tji  from  which  simple  choices  for  a  and  b  hence  ki  and  k2,  can  be  made.  There  is  no  notion 
of  a  best  choice  and,  to  keep  notation  simpler,  we  suppress  Jb^and  ib2  in  the  sequel. 
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3  Prior  specification  and  proper  posteriors 


Since  primary  interest  is  in  the  likelihood  and  since  only  occasionally  will  there  be 
useful  prior  information  we  consider  vague  specification  of  the  prior  f{fi,  w,  ^).  In  the  case  of 
generalized  liwaan-  models,  where  g  is  specified,  a  multivariate  normal  prior  for  0  is  customary 
yielding  a  flat  prior  as  the  precision  matrix  tends  to  0.  For  such  models,  assuming  <!>  is  known, 
Ibrahim  and  Laud  (1991)  consider  Jefeeys’s  prior,  the  square  root  of  the  determinant  of 
Fisher’s  information  matrix.  Let  X  be  the  nx  p  matrix  whose  rows  are  the  x^’s  and  let 
Af  be  an  nxn  diagonal  matrix  such  that  where  V(/4)=  h  (fl).  Then 

"Jeffice5^’s*^ribr  Is  pfoportiOTiar  to  Ei  our  case  to  determines  y,  so  given 

Jefireys’s  prior  is  a  specification  of  (j>),  i.e., 

mw,<f>))<x\X^MX\^f^.  (5) 

In  evaluating  (5)  we  require  /(/*<)  =  druldfii.  But  dfiifdiii  — 

From  (4)  ■ 

j=i 

with  J^(i7i)  =  {go^y {Vi)T' {go\Vi)y  Thus  dm/dfn  =  {f  {zf 0){T-^y {J{x[ 0))}  ^ 

The  prior  specification  is  completed  by  providing  /(to,  ^).  If  ^  is  intrinsically  given  we 
only  require  X^)*  Is  case  with  our  Poisson  regression  example  in  section  5.  If  not, 
we  take  J{w,tp)=  X^)X^)  since  ^  is  a  scale  parameter,  customary  choice  for  would 

be  an  inverse  gamma  density  or  a  limit  thereof. 

/(to)  is  a  distribution  on  the  r-dimensional  simplex.  If  yo  is  a  baseline  link  for  g  then 
we  might  choose  /(to)  such  that,  a  priori,  g  is  “centered”  around  go-  The  data  would  then 
revise  this  prior  in  terms  of  support  for  go~  Centering  g  around  corresponds  to  centering 
J  around  Jo»  i-c-j  from  (4),  centering  t«/B(tt;q,cii)  around  ti.  If  we  center  using  the 
TTieATi,  as  is  t3rpically  done  in  the  case  of  Dirichlet  processes,  we  obtain 

J]  E{n)IB{u;  ci,di)  =  u  (7) 

{si 
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Suppose  w  ~  Dir{‘j  1).  Thca  (7)  requires  2  cj,  4t)  —  »•  H  we  use  c»  and  di  as  in 
section  2  and  take  r  even,  expansion  of  the  terms  in  this  summation  aibout  TiQsl/2,  yields, 
to  a  first  order  approximation,  an  average  which  is  u.  We  omit  details. 

When  does  a  proper  posterior  result  under  the  Bayesian  model  w,  w,  ^)? 
The  prior  for  u;,  Dir(7l)  is  proper  if  7  >0.  Assuming  /(^)  is  proper,  if  ^  is  unknown,  we 
may  investigate  propriety  of  the  posterior  by  considering  the  integrability  of 

(8) 

If  Is  proper,  then  (8)  exists.  If  X)9|w,  <ft)  is  fiat,  integrability  follows  if  logi^  is  concave 

in  0  for  each  w  and  If  is  JefiEreys’s  prior  as  in  (5)  we  may  utilise  Theorem  2.1  of 

Ibrahim  and  Laud  (1991).  In  particular,  if  as  a  function  of  0  for  fixed  w  and  Lis  bounded 
above  and  if  for  each  observed  y, 

J  expy>~^vi{zyi  -  h{z)}]{}>\z)Yl^dz  <  00  (9) 

then  (8)  is  integrable.  Note  that  (9)  does  depend  upon  the  link  function  g  enabling  (8)  to 
be  integrable  for  all  w.  Boundedness  of  L  in  holds  under  very  general  conditions  as  in 
Bamdorfi-Nielsen,  (1977).  These  conditions  hold  for  the  usual  models  arising  under  (1). 


4  Implementing  the  Bayesian  model. 

The  posterior  distribution  of  is  proportional  to  An¬ 

alytic  investigation  of  this  p-f  (r-l)-Hl  dimensional  nonnonnalised  joint  distribution  is  infea¬ 
sible  so  we  adopt  a  sampling  based  approach  using  a  Markov  chain  Monte  Carlo  algorithm 
to  obtain  observations  essentially  from  this  posterior  distribution.  In  particular,  we  use  a 
version  of  the  Gibbs  sampler  (Gelfand  and  Smith,  1990)  updating  then  w,  then  (j>  to 
complete  one  iteration.  The  associated  nonstandardised  complete  conditional  distributions 
are  not  available  explicitly.  Prom  (2)  and  (4),  for  a  given  /0,  to  evaluate  X(^,  w,  <t>)  requires 
nr  incomplete  Beta  function  evaluations  so  making  draws  directly  from  these  distributions 
is  inconvenient.  Instead,  we  utilise  a  Metropolis-within-Gibbs  algorithm  (MuUer,  1993)  with 
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a  nxolti'miate  normal  proposal  density  for  /9,  a  univariate  normal  proposal  density  for  log^ 
and  a  Dizichlet  proposal  density  for  to.  Under  a  logit  transformation  of  to  to  r-1  dimen¬ 
sional  space  tills  last  proposal  could  be  a  multivariate  normal.  We  run  short  Metropolis 
sub-trajectories,  typically  20  to  50  iterations,  for  each  update  within  iteration  of  the 
Gibbs  sampler.  Such  trajectories  requires  only  one  new  evaluation  of  the  likelihood  to  pro¬ 
ceed  horn  the  current  step  of  the  trajectory  to  the  neict.  Starting  points  for  replications 
of  the  Gibbs  sampler  are  taken  in  the  vicinity  of  the  mayitmitn  likelihood  estimates  for  0 
under  the  baseline  link  po  and  the  associated  moments  estimator  for  ^  when  ^  is  present. 
If  we  update  w  first,  no  starting  vt  values  are  needed.  We  adaptively  improve  the  proposal 
densities^llowing  Muller’s”Stiggestions7for-t3rpically  25  iterations  after  which  we  run  five 
to  ten  parallel  Gibbs  replications  using  various  “convergence”  diagnostics  to  decide  when  to 
stop.  The  retained  output  of  the  Gibbs  sampler,  denoted  by  j  =  1,2, . . .  ,Tn 

is  approximately  a  sample  foom  the  posterior  enabling  us  to  ceury  out  any  desired  posterior 
inference. 

Matters  of  model  comparison  are  examined  in  the  Bayesian  framework  xising  predictive 
distributions.  Here  we  need  to  study  sensitivity  to  the  choice  of  number  of  mixands  and 
to  compare  the  fit  of  the  generalized  linear  model  with  unknown  link  to  the  associated 
model  using  the  baseline  link.  Under  an  improper  prior  specification  the  prior  predictive 
distribution,  as  a  function  of  yi  f  L(0,vjy^)f{0,w,4)dl3dwd4,  is  improper  hence  difficult  to 
compare  amongst  models.  We  adopt  a  cross  vaJidaiion  approach,  in  particular,  considering 
the  proper  predictive  densities  f{yi\y{i)),  x  =  1, 2, . . . , n  where  denotes  y  with  y*  removed. 
See  Gelfand,  Dey  and  Chang  (1992),  for  further  discussion  in  this  regard.  In  particular,  for 
each  i  we  obtain  a  95%  equal  tail  predictive  interval  for  yi  using  f(yi|y(i),ofc,)  and  compare  it 
y«,o6*-  A  so-called  adequacy  plot  graphs  these  intervals  along  with  yi^  vs.  i,  perhaps 
after  relabeling  the  observations  to  increasing  order.  We  also  calculate  f{yi,obt\y(i),obt)i 
conditional  predictive  ordinate.  A  large  value  indicates  agreement  between  the  observation 
and  the  model  (  Pettit  and  Young,  1990).  A  conditional  predictive  ordinate  plot  graphs 
/(yt.oS«|y(t),o6a)  i-  Using  these  diagnostics,  models  can  be  compared  at  the  observation 
level,  i.e.,  for  each  i,  i  =  1,2, ...,n.  Monte  Carlo  estimation  of  /(yi|y(i)^)  and  sampling 
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/(y«iy(t).a*«)  is  taken  up  in  Gel&nd  &  Dey  (1993).  No  details  aie  given  here. 


5  An  illustrative  example 

To  illustrate  the  modeling  of  the  previous  section  we  study  a  data  set  from  McOullagh  and 
Nelder  (1989,  p.204)  concerning  the  number  of  damage  incidents  to  cargo  vessels  caused  by 
waves.  There  are  three  qualitative  covariates  :  ship  type  (five  levels),  year  of  construction 
(four  levels)  and  period  of  operation  (two  levels).  Year  of  construction  is  taken  to  be  a 
-continuous^ covaiiate.-  Because*4he  response  is  a  count,  Poisson  regression  seems  appropriate. 
The  n=34  coimts  ranging  horn  0  to  58  with  four  bigger  than  30.  As  in  section  2,  we  assume 
an  unknown  link  of  the  form  T~\J{r}))  with  J(rf)  as  in  (4).  We  take  the  canonical  link 
1  =  ^o(m)  =  the  baseline.  Under  the  Poisson  model  ^  is  intrinsically  equal  to  one. 

The  likelihood  is  log  concave  in  $  given  vi.  We  take  f{P\w)  =  1  with  /(«;)=  Dir(l),  i.e.,  a 
flat  prior. 

Figure  1  shows  a  conditional  predictive  ordinate  plot,  where  the  i’s  axe  associated  with 
increasing  for  the  baseline  model  as  well  as  the  cases  ra3,4.  The  r  ^  and  4  models 
are  quite  similar  both  improving  upon  the  baseline  modeL  We  tried  larger  r’s  with  little  gain 
primarily  because,  regardless  of  r,  the  link  function  is  still  strictly  increasing.  The  declining 
pattern  of  the  conditional  predictive  ordinate  is  expected.  Since  f{yi\y[{)^)  is  discrete, 
when  it  is  concentrated  say  at  0  and  yt^=0,  the  conditional  predictive  ordinate  becomes 
a  substantial  point  msiss.  We  next  compare  the  r=3  model  with  the  baseline  model,  the 
latter  fitted  as  in  Dellaportas  and  Smith  (1992).  Figures  2a  and  2b  show  the  adequacy  plots 
for  these  two  models.  Better  fit  would  be  expected  for  the  r=3  model  since  it  incorporates 
two  additional  parameters  (weights).  Indeed,  17  intervals  fail  in  2a,  only  8  in  2b.  Again  if 
predicts  few  incidents,  shorter  predictive  intervals  arise. 

Finally,  to  compare  link  functions  between  the  two  models  it  is  easier  to  work  with  g~^{rj). 
We  “estimated”  g~^  hy  g~^,  the  Monte  Carlo  posterior  mean,  i.e.,  the  average  of  the  m  =2000 
link  functions  arising  from  each  of  the  tOj*.  Comparison  of  g~^  and  go~^  is  on  a  range  of  rj 
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Tallies  zoQghiy  that  over  which  the  model  is  fitted,  (gb(ywm)i^(yinM)]>  Figure  3  plots  the 
ratio  g~^/go~^  over  this  interval.  is  larger  than  for  small  17,  i.e.,  g  is  smaller  than 
Po  for  small  /t,  vice  versa  for  large  ft.  This  is  reflected  in  figure  2.  Relative  to  the  baseline 
model  the  intervals  for  the  r=3  model  axe  lower  for  small  yr,  higher  for  large  y,. 


6  Summary 

Our  approach  to  modeling  an  unknown  link  function  using  a  mixture  of  beta  densities  is  di- 
•rectlyi applicable  toother  statistical  settingsinvolving  modelkig  a  monotone  function.  These 
include  integrated  hazards  in  survival  models  and  bias  functions  in  size-biased  sampling. 
The  Bajresian  inference  framework  with  a  sampling-based  implementation  offers  a  relatively, 
straightforward  fitting  technique. 
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Figure  1:  Conditiooal  predictive  ordmate  plot  for  r^S  model  (•),  rB4  model(A)  and 
baaeline  model  (x). 

Figure  2:  2a  is  an  adequacy  plot  for  tlie  baseline  model,  2b  for  r=3  model.  95%  equal 
tail  predictive  interval  are  indicated  by  V  and  A,  observed  value  by  +• 


Figure  3;  Plot  of  g“^{ri)/so‘‘\v)  ^  ’?• 
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